Knowledge Annotation and Visualization

Summarizing Knowledge using Network Analysis

Similarly to how the Clinical Knowledge Graph (CKG) can be used to annotate a list of proteins based on their connections, CKG can also annotate a list of drugs. CKG generates a comprehensive graph with all the connections to Diseases, Drugs, Protein targets, Protein Complexes, Pathways and Side effects.

All the connections extracted from CKG are then summarized into a smaller subgraph containing only the top nodes of each type (Disease, Drug, Complex, Pathway, Side effects, Publications), ranked by network analysis algorithms (centrality, PageRank).

The connections extracted from the graph are:

  • Drug-drug interactions

  • Drug-disease indications

  • Drug-target associations

  • Target-disease associations

  • Target-complex associations

  • Drug-pathway annotations

  • Target-pathway annotations

  • Protein-publication mentions

  • Disease-publication mentions

These connections are extracted using the queries defined in report_manager/queries/knowledge_annotation.yml, which can be extended by adding new queries in the same format.

Here, we show several examples of how to extract and visualize knowledge for a list of drugs.

[1]:
import pandas as pd
from ckg.report_manager import knowledge

Annotation of Proteins Linked to a Specific Disease

We use the Open Targets platform (https://www.targetvalidation.org/) to obtain lists of genes associated with Fibromyalgia. Open Targets compiled a list of 57 protein targets associated with Fibromyalgia (https://www.targetvalidation.org/disease/EFO_0005687/associations?fcts=datatype:known_drug).

Fibromyalgia is a medical condition characterized by chronic widespread pain and a heightened pain response to pressure. Other symptoms include tiredness severe enough to affect normal activities, sleep problems and trouble with memory (source: https://en.wikipedia.org/wiki/Fibromyalgia).

We feed the list of proteins to CKG to prioritize all the knowledge gathered in the graph to reveal relationships to other possibly related diseases as well as possible treatments and altered biological processes and pathways.

[2]:
drug_list = ['Cefadroxil','Bronopol',
             'Pyritinol','Menadione',
             'Idarubicin','Proscillaridin',
             'PF-04691502', 'Sulisobenzone',
             'Tolnaftate', 'Uracil mustard',
             'Racecadotril', 'Atracurium besylate',
             'Galantamine', 'Sulfanitran',
             'Hydroquinidine', 'Thiamine',
             'Levofloxacin', 'Gefitinib']

Knowledge Object

To annotate the list of drugs, we create an empty object of type Knowledge.

Once we have the object, we can simply call the function annotate_list(), specifying the list of drugs, optionally the disease (or diseases) of interest, and the types of entities we want to annotate (Disease, Drug, Pathway, etc.).

[3]:
#Create Knowledge object
kn = knowledge.Knowledge(identifier='List_of_drugs', data=None)
[4]:
# Annotate the list of drugs using the function annotate_list
kn.annotate_list(query_list=drug_list, # list of drugs
                 entity_type='drug', # type of items in the list
                 queries_file=None, # Allows YML file with customized queries or the default (None)
                 attribute='name',  # What we provide in the list (name, id)
                 diseases=[], # List of diseases
                 entities=None) # what types of annotations (Disease, Drug, Pathway, etc.)

This function runs all the queries in queries_file (default: report_manager/queries/knowledge_annotation.yml) associated with the given entity_type (here, drug) and limits the queried information to relationships involving the items in the list provided.

Summarization and Visualization

The graph contains millions of relationships, so the results of the annotation may be too cumbersome to interpret directly.

In order to summarize the results and make them easier to understand and navigate, CKG uses network analysis algorithms (centrality (betweenness, closeness) and pagerank) to prioritize the nodes in the knowledge annotation graph.

The result summarizes the relationships of the top nodes of each entity type (Disease, Drug, Pathway, Biological_process, Complex, Publication) according to these algorithms. By default the top 15 nodes are kept; the example below sets num_nodes=20.
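The prioritization step described above can be illustrated with a minimal NetworkX sketch. This is not CKG's implementation: the helper top_nodes_by_type, the node attribute name "type", and the toy node names are all assumptions made for illustration. It scores every node with the chosen algorithm, keeps the highest-scoring nodes per entity type, and takes the induced subgraph.

```python
import networkx as nx


def top_nodes_by_type(graph, method="betweenness", num_nodes=15):
    """Return {node_type: [top node names]} ranked by the chosen score.

    Hypothetical helper; assumes each node carries a "type" attribute.
    """
    scorers = {
        "betweenness": nx.betweenness_centrality,
        "closeness": nx.closeness_centrality,
        "pagerank": nx.pagerank,
    }
    scores = scorers[method](graph)
    by_type = {}
    for node, node_type in graph.nodes(data="type"):
        by_type.setdefault(node_type, []).append(node)
    # Sort by descending score; break ties alphabetically for stable output
    return {
        node_type: sorted(nodes, key=lambda n: (-scores[n], str(n)))[:num_nodes]
        for node_type, nodes in by_type.items()
    }


# Toy directed graph with placeholder names (not real CKG data)
toy = nx.DiGraph()
toy.add_nodes_from(["drug_a", "drug_b"], type="Drug")
toy.add_nodes_from(["protein_1", "protein_2"], type="Protein")
toy.add_nodes_from(["disease_x", "disease_y"], type="Disease")
toy.add_edges_from([
    ("drug_a", "protein_1"),
    ("drug_b", "protein_1"),
    ("drug_b", "protein_2"),
    ("protein_1", "disease_x"),
    ("protein_1", "disease_y"),
])

top = top_nodes_by_type(toy, method="betweenness", num_nodes=1)
# Keep only the selected nodes and the edges among them
summary = toy.subgraph([n for nodes in top.values() for n in nodes])
print(top)
print(sorted(summary.edges))
```

In this toy graph, protein_1 sits on every drug-to-disease path, so betweenness ranks it above protein_2. Selecting per type, rather than globally, keeps every entity type represented in the summary.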

The summarized results can be visualized either as a Sankey plot or as a network.
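To make the Sankey view more concrete, the sketch below shows how a list of weighted relationships can be turned into the label and index lists that Sankey plotting tools such as Plotly's go.Sankey typically expect. The function sankey_inputs and the toy edges are hypothetical and are not part of CKG; how CKG builds its own Sankey figure is not shown here.

```python
def sankey_inputs(edges):
    """Convert (source, target, value) triples into Sankey labels and links.

    Hypothetical helper: nodes are numbered in order of first appearance,
    and each link refers to its endpoints by those numbers.
    """
    labels = []
    index = {}
    for source, target, _ in edges:
        for name in (source, target):
            if name not in index:
                index[name] = len(labels)
                labels.append(name)
    links = {
        "source": [index[s] for s, _, _ in edges],
        "target": [index[t] for _, t, _ in edges],
        "value": [v for _, _, v in edges],
    }
    return labels, links


# Placeholder relationships (not real CKG data)
toy_edges = [
    ("drug_a", "protein_1", 1),
    ("drug_b", "protein_1", 1),
    ("protein_1", "disease_x", 2),
]
labels, links = sankey_inputs(toy_edges)
print(labels)
print(links)
```

A Sankey plot emphasizes flow between entity types (for example, drug to target to disease), whereas the network view is better suited to spotting hubs and clusters.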

[5]:
kn.generate_report(visualizations=['network', 'sankey'], # how to visualize the results (network, sankey)
                   summarize=True, # Whether or not to summarize the annotation
                   method='betweenness', # Method for summarizing the annotation (betweenness, closeness, pagerank)
                   inplace=True, # If True, the summarized graph is saved; otherwise the full graph is kept
                   num_nodes=20) # Number of top nodes to be used in the visualization (default 15)
[6]:
kn.report.visualize_report(environment='notebook')[0]

All the Knowledge is Accessible

All the relationships extracted from the CKG are stored as a dataframe in the class property data.

[7]:
kn.data.shape
[7]:
(10750, 7)
[8]:
kn.data.head()
[8]:
   r.source  rel_type        source               source_type  target               target_type  weight
0  NaN       INTERACTS_WITH  Atracurium besylate  [Drug]       Galantamine          [Drug]       NaN
1  NaN       INTERACTS_WITH  Atracurium besylate  [Drug]       Proscillaridin       [Drug]       NaN
2  NaN       INTERACTS_WITH  Cefadroxil           [Drug]       Levofloxacin         [Drug]       NaN
3  NaN       INTERACTS_WITH  Galantamine          [Drug]       Atracurium besylate  [Drug]       NaN
4  NaN       INTERACTS_WITH  Galantamine          [Drug]       Gefitinib            [Drug]       NaN
[9]:
kn.data.tail()
[9]:
      r.source  rel_type                  source          source_type  target         target_type    weight
9293  None      MENTIONED_IN_PUBLICATION  Uracil mustard  [Drug]       PMID:26048278  [Publication]  NaN
9294  None      MENTIONED_IN_PUBLICATION  Uracil mustard  [Drug]       PMID:29596642  [Publication]  NaN
9295  None      MENTIONED_IN_PUBLICATION  Uracil mustard  [Drug]       PMID:19001432  [Publication]  NaN
9296  None      MENTIONED_IN_PUBLICATION  Uracil mustard  [Drug]       PMID:30845999  [Publication]  NaN
9297  None      MENTIONED_IN_PUBLICATION  Uracil mustard  [Drug]       PMID:25383193  [Publication]  NaN
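Because the relationships are held in a regular pandas dataframe, standard pandas operations can be used to explore them. The sketch below uses a small stand-in dataframe with a subset of the columns shown above and placeholder values; with a real Knowledge object, the same calls could be applied to kn.data instead.

```python
import pandas as pd

# Stand-in for kn.data with placeholder values (not real CKG output)
toy_data = pd.DataFrame(
    [
        ("INTERACTS_WITH", "drug_a", "Drug", "drug_b", "Drug"),
        ("INTERACTS_WITH", "drug_b", "Drug", "drug_a", "Drug"),
        ("MENTIONED_IN_PUBLICATION", "drug_a", "Drug", "PMID:placeholder", "Publication"),
    ],
    columns=["rel_type", "source", "source_type", "target", "target_type"],
)

# How many relationships of each type were retrieved?
counts = toy_data["rel_type"].value_counts()

# Keep only drug-drug interactions and list the partners of each drug
interactions = toy_data[toy_data["rel_type"] == "INTERACTS_WITH"]
partners = interactions.groupby("source")["target"].apply(list)

print(counts)
print(partners)
```

Counting by rel_type is a quick way to see which kinds of relationships dominate the annotation before deciding how to summarize it.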

The generated knowledge subgraph can also be accessed as a NetworkX directed graph (DiGraph).

[10]:
kn.graph
[10]:
<networkx.classes.digraph.DiGraph at 0x1e5d29cbd48>
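Since the subgraph is a standard NetworkX DiGraph, the usual NetworkX calls apply to it. The sketch below builds a small stand-in graph with placeholder names; the edge attribute name "type" and its values are assumptions, as the attributes CKG stores on kn.graph are not shown here.

```python
import networkx as nx

# Stand-in for kn.graph with placeholder nodes (not real CKG output)
toy_graph = nx.DiGraph()
toy_graph.add_edge("drug_a", "drug_b", type="INTERACTS_WITH")
toy_graph.add_edge("drug_a", "protein_1", type="TOY_TARGET_RELATION")

# Basic size of the graph
n_nodes = toy_graph.number_of_nodes()
n_edges = toy_graph.number_of_edges()

# Nodes directly reachable from a given drug
neighbours = sorted(toy_graph.successors("drug_a"))

# Edges of one relationship type, selected by the assumed "type" attribute
interaction_edges = [
    (u, v) for u, v, rel in toy_graph.edges(data="type") if rel == "INTERACTS_WITH"
]

print(n_nodes, n_edges, neighbours, interaction_edges)
```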

The report can also be downloaded to a specified directory. The directory will contain the Sankey visualization in png and svg formats, the network in gml and json formats, and the node and edge (relationship) tables in tsv format.

[11]:
kn.report.download_report('tmp/List_of_drugs')